Skip to content

fix(mcp): prevent MCPClient from hanging on exit#1847

Closed
giulio-leone wants to merge 1 commit intostrands-agents:mainfrom
giulio-leone:fix/mcp-client-hang-on-exit-1732
Closed

fix(mcp): prevent MCPClient from hanging on exit#1847
giulio-leone wants to merge 1 commit intostrands-agents:mainfrom
giulio-leone:fix/mcp-client-hang-on-exit-1732

Conversation

@giulio-leone
Copy link
Contributor

@giulio-leone giulio-leone commented Mar 7, 2026

Bug

MCPClient.stop() hangs indefinitely when the background thread cannot be stopped, causing scripts to freeze on exit.

Reported in: #1732

Root cause

_background_thread.join() was called without a timeout. When stop() is triggered from Agent.__del__() (garbage collector) or during interpreter shutdown, the event loop may already be closed, so run_coroutine_threadsafe(_set_close_event(), loop) silently fails (raises RuntimeError). The background thread never receives the shutdown signal, and join() blocks forever.

Fix

  1. Add cleanup_timeout parameter (default 5.0 seconds) to MCPClient.__init__()
  2. Use join(timeout=cleanup_timeout) instead of indefinite join()
  3. Catch RuntimeError from run_coroutine_threadsafe when the event loop is already closed
  4. Log a warning when the thread doesn't stop in time, but proceed with cleanup anyway

Testing

  • Added test_stop_does_not_hang_when_background_thread_unresponsive — verifies timeout prevents hang
  • Added test_stop_handles_closed_event_loop — verifies graceful handling of closed event loop
  • Updated existing join() assertions to verify timeout parameter
  • All 54 MCP client tests pass

Fixes #1732

Documentation PR

Not required — this is an internal bug fix that adds a timeout parameter to MCPClient.stop(). The existing API surface is unchanged; the new cleanup_timeout parameter is optional with a sensible default.

@giulio-leone
Copy link
Contributor Author

Friendly ping — CI is green, tests pass, rebased on latest. Ready for review whenever convenient. Happy to address any feedback. 🙏

…ad is unresponsive

MCPClient.stop() called _background_thread.join() without a timeout,
causing the process to hang indefinitely when the background thread
could not be signaled (e.g., event loop already closed during
interpreter shutdown or finalizer invocation).

Changes:
- Add cleanup_timeout parameter (default 5.0s) to MCPClient.__init__
- Use timeout in _background_thread.join() to prevent indefinite hangs
- Catch RuntimeError from run_coroutine_threadsafe when the event
  loop is already closed
- Log a warning when the thread does not stop within the timeout

Fixes strands-agents#1732
@giulio-leone
Copy link
Contributor Author

Friendly ping — rebased on latest and ready for review. Happy to address any feedback!

@mkmeral
Copy link
Contributor

mkmeral commented Mar 13, 2026

/strands review

@codecov
Copy link

codecov bot commented Mar 13, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

@github-actions
Copy link

Issue: Missing "Documentation PR" section in PR description.

Suggestion: Please add a "Documentation PR" section to the PR description. Since this adds a new cleanup_timeout parameter with a sensible default (5.0s) and is primarily a bugfix, a brief justification like "Internal parameter for cleanup behavior - not user-facing configuration that needs documentation" would be acceptable.

If you believe the new parameter warrants user documentation (e.g., for users experiencing slow MCP server shutdowns), please link a documentation PR.

Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assessment: Request Changes

Well-implemented fix for the MCPClient hanging issue. The approach of adding a timeout to join() and gracefully handling closed event loops is correct and follows the existing MCP client architecture patterns.

Review Summary
  • Code Quality: Clean implementation with proper type annotations, docstrings, and error handling
  • Testing: Comprehensive tests covering the regression case and edge conditions
  • Backward Compatibility: New cleanup_timeout parameter with sensible default preserves existing behavior
  • Documentation: PR description is missing a "Documentation PR" section - please add either a link to a docs PR or a justification for why documentation updates aren't needed

Once the Documentation PR section is added, this is ready to merge. 👍

@giulio-leone
Copy link
Contributor Author

The "Documentation PR" section has been added to the PR description ✅

Not required — this is an internal bug fix that adds a timeout parameter to MCPClient.stop(). The existing API surface is unchanged; the new cleanup_timeout parameter is optional with a sensible default.

Could the bot review be re-run or dismissed? Happy to address any other feedback.

@giulio-leone
Copy link
Contributor Author

@mkmeral Could you re-run /strands review? The previous review flagged a missing "Documentation PR" section, which has since been added to the PR description. Would love to get this unblocked. Thanks! 🙏

@giulio-leone
Copy link
Contributor Author

Hi — the ## Documentation PR section is already present in the PR description ("Not required — this is an internal bug fix..."). Happy to adjust the wording if needed!

@mkmeral
Copy link
Contributor

mkmeral commented Mar 16, 2026

It seems like we have merged in a fix last week for the given hanging issue. #1830

I will close this PR in favor of that one for now. That PR is a better approach as it fixes the underlying issue

@mkmeral mkmeral closed this Mar 16, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BUG] Managed MCPClient integration hangs on exit

2 participants